I/O virtual memory (IOMMU) support #327
Conversation
I know why the code coverage CI check fails: I (purposefully) don’t have unit tests for the new code. Why the other tests failed, I don’t know; but I suspect it’s because I force-pushed an update (fixing the CHANGELOG.md link) while the tests were running, maybe SIGTERM-ing them. Looking at the timelines, they all failed (finished) between 16:27:02.985 and 16:27:02.995. (Except the coverage one, which is an actual failure.)

Pushed an update without actual changes (just re-committing the top commit) to trigger a CI re-run. This time, only the coverage check failed (as expected).
Force-pushed from 5ee020b to 4104d5f
Added tests for the new functionality, and rebased on the main branch.

cc @germag

Added support for bitmaps in I/O virtual address space (required for migration).
src/io_memory.rs (outdated)
```rust
/// Underlying `GuestMemory` type.
type PhysicalMemory: GuestMemory;
/// Dirty bitmap type for tracking writes to the IOVA address space.
type Bitmap: Bitmap;
```
I am not sure I like the design here. The IOVA address space, being virtual, can have aliases. Could you still store the bitmap in the low-level memory region, but with accesses going through IOMMU translation?
What does vhost-user require? Are dirty bitmap accesses done in IOVA or GPA space?
> Are dirty bitmap accesses done in IOVA or GPA space?
They’re done in IOVA space, which is why I think we need the bitmap on this level.
(If we continued to store it in e.g. `GuestRegionMmap`/`MmapRegion`, it would need a reverse translation, and then keep a mapping of address ranges to bitmap slices instead of just a single bitmap slice linearly covering the entire region.)
> They’re done in IOVA space, which is why I think we need the bitmap on this level.
I see. Maybe you could add a `bitmap::BS<'a, Self::Bitmap>` as another argument that `try_access()` passes back to the callback? And then `IommuMemory::get_slice()` can pass that argument to `replace_bitmap()`.

This way, the `Iommu` controls whether dirty tracking is done in IOVA or GPA space.
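For illustration, a minimal self-contained sketch of that idea. Every name here (`DirtySlice`, `Region`, the callback shape) is a stand-in rather than vm-memory's actual API:

```rust
pub struct GuestAddress(pub u64);
pub struct Region; // stand-in for a memory region type
pub struct DirtySlice; // stand-in for bitmap::BS<'a, B>

impl DirtySlice {
    pub fn mark_dirty(&self, _offset: usize, _len: usize) {}
}

// The callback additionally receives the bitmap slice it should dirty, so
// whoever implements try_access() decides which address space that slice
// tracks (IOVA or GPA).
pub fn try_access<F>(count: usize, addr: GuestAddress, mut f: F) -> Result<usize, ()>
where
    F: FnMut(usize, usize, &Region, &DirtySlice) -> Result<usize, ()>,
{
    // A real implementation would translate `addr` and call `f` once per
    // contiguous mapping; this sketch calls it once with dummy values.
    let _ = addr;
    f(0, count, &Region, &DirtySlice)
}

fn main() {
    let written = try_access(16, GuestAddress(0x1000), |offset, len, _region, bitmap| {
        bitmap.mark_dirty(offset, len); // dirty exactly what was written
        Ok(len)
    });
    assert_eq!(written, Ok(16));
}
```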
You mean putting the responsibility on the `try_access()` callback to dirty a bitmap slice given to it if it has written anything?

I’m not sure; that does sound more than reasonable, but it would require more changes in the callers than just adding the `Permissions` flag. 🙂
> You mean putting the responsibility on the try_access() callback to dirty a bitmap slice given to it if it has written anything?

No, I confused `try_access` and `get_slice`, sorry. For `IoMemory`, dirtying is already done in `try_access` itself; for `Iommu`, the slice could be returned by `translate`, that is, it would be part of the `Iotlb` entry?
I’m sorry, I don’t quite follow; currently, dirtying is done only by `IommuMemory` (in its `IoMemory` implementation), neither by `IoMemory` nor by `Iommu`.

`<IommuMemory as IoMemory>::try_access()` dirties the bitmap itself, right.

`<IommuMemory as IoMemory>::get_slice()` has the `VolatileSlice` do the work. For this, it replaces the `VolatileSlice`’s internal bitmap slice (which is initially set by the `GuestMemoryRegion`) by the right slice in the IOVA space.
I don’t think the IOMMU object should do any of this, and I don’t think it should return slices (if I understand correctly). I think it should only do the translation and not care about the actual memory.
> <IommuMemory as IoMemory>::get_slice() has the VolatileSlice do the work. For this, it replaces the VolatileSlice’s internal bitmap slice (which is initially set by the GuestMemoryRegion) by the right slice in the IOVA space.
To clarify, I was saying it should not be the `IommuMemory` that decides the address space used by the bitmap - whether IOVA space as vhost wants, or GPA space as everyone else (?) wants. I thought `Iommu` would return the slice in the `Iotlb` object, but I agree that probably `Iommu` isn't the right place either.

That said, I don't care much, because as long as `IommuMemory` is hidden behind a feature, it's just a convenience or a sample implementation (and QEMU probably will not use `IommuMemory`, only `IoMemory`). Maybe just add a comment that says "NOTE: `IommuMemory` can only be used if the dirty bitmap tracks accesses in IOVA space".
> Maybe just add a comment that says "NOTE: IommuMemory can only be used if the dirty bitmap tracks accesses in IOVA space".
To be precise, it depends on the IOMMU enabled setting: With it enabled, it’s IOVA space; otherwise, it’s whatever the address space of the underlying `GuestMemory` is (for vhost, that would be VUA).
I could speculate on how it works for non-vhost (specifically in-hypervisor) cases; maybe the underlying memory will use GPA in that case, and therefore we can (and have to) track dirty accesses in GPA then. To make that work, we would only need a separate flag to control whether the IOVA dirty bitmap is supposed to be used or not.
But I think the best would be for me not to speculate and just leave this as it is – if such a separate flag would be enough, we can just add it later. For the time being, as you suggest, a comment that specifies the dirty bitmap behavior should be enough.
Overall this is good stuff. It also has a lot of overlap with the experimental changes that were necessary in order to use vm-memory in QEMU, which is a very good thing.
I have two main questions:

- do we really need an `IoMemory`, or can we change `GuestMemory`?
- if we need an `IoMemory`, should the `PhysicalMemory: GuestMemory` type instead be an `AS: GuestAddressSpace`? (and likewise for `physical_memory()`)?
I'm marking this as "request changes" because there would be changes either way.
Please extract the first three commits, up to "Bytes: Do not use to_region_addr()", into a separate PR.

+1 to changing

No, I don't think that is a good idea. IOVA and GPA are the same concept, though in different address spaces. My hope when leaving the review was that no changes are needed outside vm-memory, and the interpretation of GuestAddress can be left to whoever creates the GuestMemory.
The main practical problem with using different types (as stated in the opening comment) is that users generally don’t even know whether a given address is a GPA or an IOVA. For example, the addresses you get from vrings and vring descriptors can be either; it just depends on whether the IOMMU is enabled or not. Users should be able to use the same code whether it is on or off, which (I think) wouldn’t really be possible with different types.
I hope Paolo will add a comment to that effect (we just had a talk); we agreed that adding I don’t like keeping (Paolo (maybe half-jokingly) suggested making the implementation of that a linking-time error by accessing undefined references in a hypothetical
Yes, GuestMemory is a nice interface for implementation but clients should switch to

Different question about this: Do we really need the

Doesn’t sound bad, but would be a more “radical change”: Intermediate users like virtio-queue currently use memory types via If we change patterns like

Mh, potentially this wouldn't be that much churn, because we could have Somewhat related question, since I'm seeing the
@bonzini: Here are some repos (renaming commits marked as “[preliminary]”):
There’s a comment on one of the vm-memory commits about the But reading it again, I’m no longer so sure. I’m bringing this up so someone can make it clear to me :)
So all in all, I understand the switch makes sense to nudge all users to change their code so they can work with virtual memory if that becomes necessary; and the new trait still allows using all existing (physical) types, so it would all remain But on the other hand, as far as I can see, in all cases where we need to be able to access virtual memory it seems to be about I/O device memory, so the name
I'm not sure about that. How to allocate memory is a choice of the VMM, not the boot loader. Suppose a board has RAM at 0..2G and 4G..8G. Whether to allocate it as a single mmap or two should not be the boot loader's choice. If two (which is what something like firecracker would do) you can use A boot loader should also make sure to prepare the guest's memory view in such a way that the guest, after view, sees exactly what was prepared by the boot loader. Take as an example the PC's ability to remap the memory below 1MB to the corresponding area below 4GB. If a boot loader uses Also, CPUs can have their own memory view, which is not the same as either the pure memory backend and not the same as any device. For example on x86 the local APIC is only visible to the CPUs. So we could call your Anyhow, looking at
One further place where I want to use the new trait is for the Xen code. With Xen Grant mappings, vm-memory needs to know whether an access from the hypervisor is a read or a write, and issue an ioctl with that information before the memory can be accessed. Currently, this is hacked into

OK, thanks for the explanations! I’ve rebased on top of #339 (and because the atomic access implementations now require that
@XanClic needs another rebase 😅

Also, given the intention to rename everything, I would not create a separate io_memory module for IoMemory, and would instead create an iommu module for IommuMemory and its friends.
Force-pushed from f4889ed to 4671a49

Force-pushed from bb70b4f to c955ffd
Update with rebase on main:
src/guest_memory.rs (outdated)
```rust
///
/// Returned by [`IoMemory::get_slices()`].
pub trait IoMemorySliceIterator<'a, B: BitmapSlice>:
    Iterator<Item = Result<VolatileSlice<'a, B>>> + Sized
```
```diff
-    Iterator<Item = Result<VolatileSlice<'a, B>>> + Sized
+    Iterator<Item = Result<VolatileSlice<'a, B>>> + Sized + FusedIterator
```
Why? Yes, the current implementations do provide that. But is it something we need the trait to require?
I think so, because you want any other implementations to be consistent. It's just a marker.
If another implementation can’t provide `FusedIterator` for free, this would incur overhead, though.

As far as I understand `FusedIterator`, users who need it can just use `.get_slices()?.fuse()`, and this should be zero-cost if the concrete type does implement `FusedIterator`, no?
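For illustration, that pattern with a plain standard-library iterator standing in for the one `get_slices()` would return:

```rust
fn main() {
    // `slice::Iter` already implements `FusedIterator`, so the `Fuse`
    // wrapper created by `.fuse()` can rely on that guarantee instead of
    // tracking exhaustion itself.
    let slices = [1u8, 2, 3];
    let mut it = slices.iter().fuse();
    assert_eq!(it.next(), Some(&1));
    assert_eq!(it.next(), Some(&2));
    assert_eq!(it.next(), Some(&3));
    // After exhaustion, a fused iterator keeps returning `None`.
    assert_eq!(it.next(), None);
    assert_eq!(it.next(), None);
}
```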
Yes, but I think generally speaking fused behavior is desirable. Chances are that users will expect fused behavior, and their code would then work only on some backends. (Also I don't see how an `IoMemorySliceIterator` could magically restart providing values after a `None`... almost all iterators should be fused IMO).
I don’t know what to say, that’s just wrong. (Edit: “that” being the assumption that all iterators are fused)
As I see it, Rust as a language has chosen iterators to generally not be fused, and requiring it in this trait would go against that design decision.
You're right, that was not really a great way to put it, so let me rephrase.

`IoMemorySliceIterator` is returned by `get_slices()`. Users of `get_slices()` should know what semantics to expect if the iterator starts providing items again after `None`. The documentation should say that, while making it clear that returning an iterator with fused behavior is acceptable too. If there is a reasonable description of what to expect, then the current code is fine but the documentation for `get_slices()` is lacking. I don't think this is the case, the documentation is fine.

But then, returning a `FusedIterator` should be part of `get_slices()`'s contract, and `+ FusedIterator` should be in either `get_slices()` or `IoMemorySliceIterator`. The former seems unnecessarily pedantic, so I'd put it in the trait.
That is, for unfused iterators to be useful, you need to know when to try calling `.next()` again and what the result would be. Generally they are used for "resuming" iterators, like `std::sync::mpsc::TryIter`. I don't think `get_slices()` should return such a resuming iterator.
OK, I’ll add the `FusedIterator` bound, but I disagree with this specific reasoning. In short: I still don’t think it’s actually useful, but I think it’s necessary.

> That is, for unfused iterators to be useful, you need to know when to try calling .next() again and what the result would be.

I really don’t see why that would be the case. You run `.next()` until it returns `None`, then you don’t use the iterator any more. In my experience, that is the absolute vast majority of use cases, and it doesn’t matter how the iterator behaves after the first `None`.

Consequently, I think it can be fine for the documentation to leave open what `next()` does once iteration is finished, if that is deliberately left implementation-defined. However, of course, clearer documentation covering more edge cases is always better.

As for users requiring fused behavior, `FusedIterator`’s documentation (as I understand it) says not to rely on bounds, but to always use `.fuse()`.

I also disagree with the implication that if the iterator isn’t fused, it must be resuming. As you say, “generally” non-fused iterators could be resuming, not always. Other behavior is perfectly legal if it’s deemed more useful, and I would say largely undefined behavior is also OK if absolutely required for optimization purposes.

Having said all that, what you do make me see is that I should deliberate what behavior I actually do want. Implementation-defined (effectively undefined) behavior comes at a cost, I can’t just choose this by default because I’m lazy: Safe code is allowed to call `.next()` after the iterator is exhausted, and while that is wrong if you don’t know what to expect, it’s still legal safe code, so it is even more wrong for me to shrug my shoulders and say “whatever, don’t do that”.
Now, if such undefined behavior actually had a good reason to it, e.g. “It’s really important for performance here”, OK, but I don’t have such a reason. The only actual reason I have is that once we add that bound, removing it would be an incompatible change. Not very strong.
So yes, I really should at least consider what behavior makes sense after iterator exhaustion, and specify it. And here I agree that fused behavior is indeed the only behavior that does make sense, so let’s add that bound.
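A small illustration of the contract being agreed on here, using a standard-library iterator (which happens to be fused):

```rust
fn main() {
    let mut it = vec![1, 2].into_iter();
    assert_eq!(it.next(), Some(1));
    assert_eq!(it.next(), Some(2));
    assert_eq!(it.next(), None);
    // Safe code may keep calling next() after exhaustion; the
    // `FusedIterator` bound pins the result down to `None` forever
    // instead of leaving it implementation-defined.
    assert_eq!(it.next(), None);
}
```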
Sounds good, in fact I agree with all that you wrote. :) Including this:
> I still don’t think it’s actually useful, but I think it’s necessary.
Thanks for writing it down better than I did.
```rust
///
/// Disabling the IOMMU switches to pass-through mode, where every access is done directly on
/// the underlying physical memory.
pub fn set_iommu_enabled(&mut self, enabled: bool) {
```
Add a getter too?
Sure!
src/iommu.rs (outdated)
```rust
/// Note that the inner `Arc` reference to the IOMMU is cloned, i.e. both the existing and the
/// new `IommuMemory` object will share an IOMMU instance. (The `use_iommu` flag however is
/// copied, so is independent between the two instances.)
pub fn inner_replaced(&self, inner: M) -> Self {
```
So if I understand correctly you'd do something like

```rust
// `mem` is a GuestMemoryAtomic
let g = mem.lock().unwrap();
let new_iommu = mem.load().inner_replaced(new_memory);
g.replace(new_iommu);
```

Maybe we can add an

```rust
impl<M: GuestMemory, I: IOMMU> GuestMemoryExclusiveGuard<'_, IommuMemory<M, I>> {
    /// Replace the memory map in the `GuestMemoryAtomic` that created
    /// the guard with the new memory map, `map`, while keeping the IOMMU.
    /// This method consumes the guard.
    pub fn replace_backend(self, map: M) {
        let old = self.parent.inner.0.load();
        let new = old.inner_replaced(map);
        self.replace(new);
    }
}
```

(and TBH I'd love for `inner_replaced` to be `pub(crate)` but maybe that's asking too much).
Yes, that’s the pattern.
(In practice, `inner_replaced()` is called through a helper trait (`VirtualMemory`) that vhost-user-backend implements on both `IommuMemory` and `GuestMemoryMmap`, which provides a `with_physical_memory()` method that uses `inner_replaced()` for `IommuMemory` and just returns the new memory for `GuestMemoryMmap`.)

Why do you want to add that method?

Also, it isn’t generic across `IoMemory`, so it could only be used specifically for `IommuMemory`. I suppose we could add a trait `Foo<M: IoMemory>` with this method, and implement it for multiple `GuestMemoryExclusiveGuards<M>`. Then, vhost-user-backend could drop that `with_physical_memory()` method (but still needs the trait anyway). That alone doesn’t seem worth the effort here to me, but that’s why I’m wondering why you would like to have that method.
> Why do you want to add that method?

Because right now `GuestMemoryAtomic` focuses on replacing the whole `GuestMemory`, but there is a use case for replacing the backend only.

> Also, it isn’t generic across IoMemory, so it could only be used specifically for IommuMemory

Yes, it's an extra `impl` only for `IommuMemory`.

> That alone doesn’t seem worth the effort here to me, but that’s why I’m wondering why you would like to have that method

More for clarity/documentation than anything else. `inner_replaced` was a bit mysterious (maybe it's just the name; `with_replaced_backend` or anything along those lines would help me too).

I see now that it can't be `pub(crate)` though!
> there is a use case for replacing the backend only.

True, but if we wanted to really provide for that use case, we’d need a generic `IoMemory` implementation (at least for all types in vm-memory).

> `inner_replaced` was a bit mysterious (maybe it's just the name; `with_replaced_backend` or anything along those lines would help me too).

Sure, I’d be more than happy to (and also rename `.inner()` to `.get_backend()`). I’ll definitely do this.
Ok, let's do the rename and agree to disagree on the rest. :)
src/mmap/mod.rs (outdated)
```rust
/// The returned reference can be cloned to construct a new `GuestRegionMmap` with a different
/// base address (e.g. when switching between memory address spaces based on the guest physical
/// address vs. the VMM userspace virtual address).
pub fn get_mapping(&self) -> &Arc<MmapRegion<B>> {
```
I'd rather call this `get_mmap()` and do the clone itself, returning an `Arc<MmapRegion<B>>`.
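A minimal self-contained sketch of that suggestion, with stand-in types rather than vm-memory's real ones:

```rust
use std::sync::Arc;

pub struct MmapRegion; // stand-in for vm_memory::MmapRegion<B>

pub struct GuestRegionMmap {
    mapping: Arc<MmapRegion>,
}

impl GuestRegionMmap {
    /// Clone the inner `Arc` and return it by value, instead of handing
    /// out a `&Arc<MmapRegion>` for the caller to clone.
    pub fn get_mmap(&self) -> Arc<MmapRegion> {
        Arc::clone(&self.mapping)
    }
}

fn main() {
    let region = GuestRegionMmap { mapping: Arc::new(MmapRegion) };
    let mmap = region.get_mmap();
    // Two owners now: the region itself and `mmap`.
    assert_eq!(Arc::strong_count(&mmap), 2);
}
```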
Sure!
Just a few requests, but mostly I think it's ready.
src/iommu.rs (outdated)
```rust
/// space.
#[derive(Debug, Default)]
///
/// Note on memory write tracking (“logging”):
```
I'll add a note here for simplicity, but it applies to the whole commit.
There are cases (e.g. QEMU :)) in which you want to track at the backend level independent of whether you have an IOMMU or not. The vhost case is the weird one due to its usage of VMM userspace addresses. I'd prefer that the comments and commit messages put less emphasis on this, explaining that there are simply two levels of bitmaps.
Related to this, I'm not super enthusiastic about forcing the usage of Arc<> for the bitmap, because having an Arc<()> in the simple case is a bit weird. However, all the derefs should be optimized away so I guess it's fine.
> I'd prefer that the comments and commit messages put less emphasis on this, explaining that there are simply two levels of bitmaps.
I understood this type to be specific to vhost’s model, hence this note saying so. Now, it will probably be easy to modify it slightly to allow for QEMU’s model as well; basically, `IommuMemorySliceIterator::do_next()` would just have to skip replacing the `VolatileSlice`’s bitmap. But I don’t see the point in doing that now, as I don’t have the knowledge whether this is really what’s needed (and I can’t test it), and such a change could be a compatible one anyway[1].

So I don’t know what you would like to change specifically. Of course I can make it clear in the commit message that this is just for the vhost case, but I’m a bit lost beyond that.

[1] Add a method `disable_virtual_bitmap()`, pass this state on to all `IommuMemorySliceIterator`s, having them skip replacing the bitmap, and this note would be modified to say “If you don’t want that, call `disable_virtual_bitmap()`”. Yes, if “not wanting that” is the default case, that’s a bit weird, but not the end of the world. But doing anything else (e.g. implementing that functionality now, adding a boolean to `IommuMemory::new()`) would require someone to vouch that that’s really what’s needed and that it would really be useful.
> I understood this type to be specific to vhost’s model, hence this note saying so.
It is not used by QEMU but it may be used by other users than vhost. This handling of dirty bitmap is not even required for IommuMemory; I just want to make it clear that it's vhost that needs to have the bitmaps organized this way. IommuMemory allows that but it's possible to use IommuMemory while having bitmaps only at the physical address level.
> But doing anything else (e.g. implement that functionality now, adding a boolean to IommuMemory::new()) would require someone to vouch that that’s really what’s needed and that it would really be useful.

Absolutely. All that I'm asking is to not "discount" the case where the IommuMemory's bitmap is `()` in the documentation.

> Of course I can make it clear in the commit message that this is just for the vhost case
Is it clearer now?
> I just want to make it clear that it's vhost that needs to have the bitmaps organized this way.
I.e. modify the comment to not just say “this type should only be used when this is the desired behavior” but explicitly something like “This is specifically the vhost-user model”?
> All that I'm asking is to not "discount" the case where the IommuMemory's bitmap is () in the documentation.

Hard, because the bitmap between the back-end and `IommuMemory` must be the same type, or we couldn’t replace it in `VolatileSlice`. Would you like the bitmap to be wrapped in `Option<>`?
> Hard, because the bitmap between the back-end and IommuMemory must be the same type, or we couldn’t replace it in VolatileSlice.
Ouch, I see. Yeah, change the comment so that others don't fall into the same mistake as me and make it clearer that IommuMemory wants the two-bitmap scenario.
Do you think it would be useful to have the bitmap be optional?
If you have any ideas we can have a separate PR before the 0.18.0 release. Overall for IommuMemory it's okay, as the whole struct is just a convenience.
OK, makes sense.
The existing `GuestMemory` trait is insufficient for representing virtual memory, as it does not allow specifying the required access permissions. Its focus on all guest memory implementations consisting of a relatively small number of regions is also unsuited for paged virtual memory with a potentially very large set of non-contiguous mappings.

The new `IoMemory` trait in contrast provides only a small number of methods that keep the implementing type’s internal structure more opaque, and every access needs to be accompanied by the required permissions.

Signed-off-by: Hanna Czenczek <[email protected]>
Rust only allows us to give one trait the blanket implementation for `Bytes`. We want `IoMemory` to be our primary external interface because it has users specify the access permissions they need, and because we can (and do) provide a blanket `IoMemory` implementation for all `GuestMemory` types.

Also, while `IoMemory` (as the more general trait) only has a restricted interface when compared to `GuestMemory`, this interface is enough to implement `Bytes`; notably, accesses to `IoMemory` require specifying the access mode, which is naturally trivial for `Bytes` methods like `read()` or `write()`.

Signed-off-by: Hanna Czenczek <[email protected]>
We want a trait like `GuestAddressSpace` for `IoMemory`, but just duplicating it into an `IoAddressSpace` trait is not so easy: We could rename the current `GuestAddressSpace` to `IoAddressSpace` and require `M: IoMemory` (instead of `M: GuestMemory`), and then define `GuestAddressSpace` as:

```rust
pub trait GuestAddressSpace: IoAddressSpace<M: GuestMemory> {}

impl<AS: IoAddressSpace> GuestAddressSpace for AS where AS::M: GuestMemory, {}
```

But doing just this would break all existing `GuestAddressSpace` users, as they’d now need to import `IoAddressSpace` to use `memory()`. (Re-)Adding `GuestAddressSpace::memory()` as

```rust
fn memory(&self) -> <Self as IoAddressSpace>::T {
    IoAddressSpace::memory(self)
}
```

also doesn’t (just) work, as it gets the compiler confused about which `memory()` to use (between `GuestAddressSpace` and `IoAddressSpace`), so the `IoAddressSpace::memory()` method would need to be called differently. However, I would find that a bit silly, and it would also then later require changes if the user wants to switch from `GuestMemory` to `IoMemory`.

Instead, just changing the `GuestAddressSpace::M: GuestMemory` requirement to `M: IoMemory` seems easier:

- All callers that just use the `Bytes` interface remain completely unchanged.
- It does break users that actually need the `GuestMemory` interface, but from what I have seen, that is only the case for `vhost::vhost_kern`. There, we can simply require that `<AS as GuestAddressSpace>::M: GuestMemory`.

Signed-off-by: Hanna Czenczek <[email protected]>
This simply makes `GuestMemoryAtomic` more general. (However, this change requires the preceding commit that relaxed the `GuestAddressSpace::M` requirement from `GuestMemory` to `IoMemory`.)

Signed-off-by: Hanna Czenczek <[email protected]>
The Iommu trait defines an interface for translating virtual addresses into addresses in an underlying address space. It is supposed to do so by internally keeping an instance of the Iotlb type, updating it with mappings whenever necessary (e.g. when actively invalidated or when there’s an access failure) from some internal data source (e.g. for a vhost-user IOMMU, the data comes from the vhost-user front-end by requesting an update).

In a later commit, we are going to provide an implementation of `IoMemory` that can use an `Iommu` to provide an I/O virtual address space.

Note that while I/O virtual memory in practice will be organized in pages, the vhost-user specification makes no mention of a specific page size or how to obtain it. Therefore, we cannot really assume any page size and have to use plain ranges with byte granularity as mappings instead.

Signed-off-by: Hanna Czenczek <[email protected]>
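A rough, self-contained sketch of the relationship this commit message describes; all definitions here are assumptions for illustration, not the PR's actual `Iommu`/`Iotlb` items:

```rust
#[derive(Clone, Copy, Debug)]
pub struct GuestAddress(pub u64);

#[derive(Clone, Copy, Debug)]
pub enum Permissions {
    Read,
    Write,
    ReadWrite,
}

/// One translated range. Byte-granular, because vhost-user mandates no
/// specific page size for IOTLB entries.
#[derive(Debug)]
pub struct Mapping {
    pub iova_base: GuestAddress,
    pub target_base: GuestAddress,
    pub len: u64,
    pub perm: Permissions,
}

pub trait Iommu {
    /// Translate `iova..iova + len` for an access with permissions `perm`,
    /// refilling the internal IOTLB from the data source (e.g. the
    /// vhost-user front-end) on a miss or after an invalidation.
    fn translate(
        &self,
        iova: GuestAddress,
        len: u64,
        perm: Permissions,
    ) -> Result<Vec<Mapping>, ()>;
}
```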
This `IoMemory` type provides an I/O virtual address space by adding an IOMMU translation layer to an underlying `GuestMemory` object.

Signed-off-by: Hanna Czenczek <[email protected]>
The vhost-user-backend crate will need to be able to modify all existing memory regions to use the VMM user address instead of the guest physical address once the IOMMU feature is switched on, and vice versa. To do so, it needs to be able to modify regions’ base address.

Because `GuestMemoryMmap` stores regions wrapped in an `Arc<_>`, we cannot mutate them after they have been put into the `GuestMemoryMmap` object; and `MmapRegion` itself is by its nature not clonable. So to modify the regions’ base addresses, we need some way to create a new `GuestRegionMmap` referencing the same `MmapRegion` as another one, but with a different base address.

We can do that by having `GuestRegionMmap` wrap its `MmapRegion` in an `Arc`, adding a method to return that `Arc`, and a method to construct a `GuestRegionMmap` object from such a cloned `Arc`.

Signed-off-by: Hanna Czenczek <[email protected]>
Without an IOMMU, we have direct access to guest physical addresses (GPAs). In order to track our writes to guest memory (during migration), we log them into dirty bitmaps, and a page's bit index is its GPA divided by the page size.

When it comes to vhost-user, however, and we use an IOMMU, we no longer know the GPA; instead, we operate on I/O virtual addresses (IOVAs) and VMM user-space addresses (VUAs). Here, the dirty bitmap bit index is the IOVA divided by the page size.

`IoMemory` types contain an internal "physical" memory type that (in the case of vhost-user) operates on these VUAs (`IoMemory::PhysicalMemory`). Any bitmap functionality that this internal type may already have (e.g. `GuestMemoryMmap` does) cannot be used for vhost-user dirty bitmap tracking with an IOMMU, because it would use the VUA, but we need to use the IOVA, and this information is not available on that lower layer.

Therefore, `IoMemory` itself needs to support bitmaps separately from its inner `PhysicalMemory`, which will be used when the IOMMU is in use. Add an associated `IoMemory::Bitmap` type and add a bitmap object to `IommuMemory`. Ensure that writes to memory dirty that bitmap appropriately:

- In `try_access()`, if write access was requested, dirty the handled region of the bitmap after the access is done.
- In `get_slice()`, replace the `VolatileSlice`'s bitmap (which comes from the inner `PhysicalMemory`) by the correct slice of our IOVA bitmap before returning it.

Signed-off-by: Hanna Czenczek <[email protected]>
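The bit-index rule this describes is simply the following (page size taken as a parameter, since vhost-user mandates none):

```rust
/// Dirty-bitmap bit index of an access: the address (GPA without an IOMMU,
/// IOVA with one) divided by the page size.
pub fn bit_index(addr: u64, page_size: u64) -> u64 {
    addr / page_size
}

fn main() {
    assert_eq!(bit_index(0x3000, 0x1000), 3);
}
```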
This commit also adds the iommu feature to the coverage_config feature list. (I left the aarch64 coverage value unchanged; I cannot find out how to get the current value on my system, and it isn’t included in CI.)

Signed-off-by: Hanna Czenczek <[email protected]>
Document in DESIGN.md how I/O virtual memory is handled.

Signed-off-by: Hanna Czenczek <[email protected]>
Signed-off-by: Hanna Czenczek <[email protected]>
Addressed Paolo’s comments:
Summary of the PR
This MR adds support for an IOMMU, and thus I/O virtual memory handling.
New Memory Trait: IoMemory

Handling I/O virtual memory requires a new interface to access guest memory: `GuestMemory` does not allow specifying the required access permissions, which is necessary when working with MMU-guarded memory.

We could add memory access methods with such a permissions parameter to `GuestMemory`, but I prefer to provide a completely new trait instead. This ensures that users will only use the interface that actually works when working with (potentially) I/O virtual memory, i.e.:

- `GuestMemory` generally assumes that regions are long, continuous, and that any address in a given range will be in the same memory region. This is absolutely no longer the case with virtual memory, which is heavily fragmented into pages.

That is, adding a new trait (`IoMemory`) allows catching a lot of potential mistakes at compile time, which I feel is much better than finding out at runtime that some place forgot to specify the access permissions.
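To illustrate the idea, a minimal self-contained sketch of such a permissions-carrying interface. The names echo the discussion (`Permissions`, `IoMemory`), but these definitions are assumptions, not the PR's actual trait:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Permissions {
    Read,
    Write,
    ReadWrite,
}

#[derive(Clone, Copy, Debug)]
pub struct GuestAddress(pub u64);

/// No access can even be expressed without stating the permissions it
/// needs, so forgetting them becomes a compile-time error.
pub trait IoMemory {
    /// Whether `addr..addr + count` is accessible with permissions `perm`.
    fn check_range(&self, addr: GuestAddress, count: usize, perm: Permissions) -> bool;
}
```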
Unfortunately, this is an incompatible change, because we need to decide on a single guest memory trait that we expect users to primarily use: We can only have one blanket implementation of e.g. `Bytes`, and this MR changes that blanket implementation to be on `IoMemory` instead of `GuestMemory`, because we want to prefer `IoMemory` with its permissions-including interface.

While this MR does provide a blanket implementation of `IoMemory` for all `GuestMemory`, Rust isn’t fully transitive here, so just because we have a blanket `impl IoMemory for GuestMemory` and a blanket `impl Bytes for IoMemory` doesn’t really implicitly give us an `impl Bytes for GuestMemory`.

What this means can be seen in virtio-queue (in vm-virtio): It uses trait bounds like `M: GuestMemory` only, but then expects to be able to use the `Bytes` trait. This is no longer possible; the trait bound must be extended to `M: GuestMemory + Bytes` or replaced by `M: IoMemory` (the latter is what we want).
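A compile-time sketch of that bound change, with stand-in traits (in the real crates these come from vm-memory; making `Bytes` a supertrait of `IoMemory` here stands in for the blanket impl):

```rust
pub trait GuestMemory {}
pub trait Bytes {}
pub trait IoMemory: Bytes {}

// Previously, `M: GuestMemory` alone was enough to use `Bytes` methods,
// thanks to the blanket impl. Now the bound must be extended ...
pub fn process_queue_explicit<M: GuestMemory + Bytes>(_mem: &M) {}

// ... or replaced by the new trait (the preferred option):
pub fn process_queue_io<M: IoMemory>(_mem: &M) {}
```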
Guest Address Type
Another consideration is that I originally planned to introduce new address types. `GuestAddress` currently generally refers to a guest physical address (GPA); but we now also need to deal with I/O virtual addresses (IOVAs), and an IOMMU generally doesn’t translate those into GPAs, but VMM user space addresses (VUAs) instead, so now there’s three kinds of addresses. Ideally, all of those should get their own type; but I felt like:

- Users generally don’t even know whether an address for an `IoMemory` object is an IOVA or a GPA. It depends on whether the IOMMU is enabled or not, which is generally a runtime question.

Therefore, I kept `GuestAddress` as the only type, and it may refer to any of the three kinds of addresses (GPAs, IOVAs, VUAs).
Async Accesses
I was considering whether to also make memory accesses optionally `async`. The vhost-user IOMMU implementation basically needs two vhost-user socket roundtrips per IOTLB miss, which can make guest memory accesses quite slow. An `async` implementation could allow mitigating that.

However, I decided against it (for now), because this would also require extensive changes in all of our consuming crates to really be useful: Anything that does a guest memory access should then be `async`. I think if we want to add this functionality later, it should be possible in a compatible manner.
Changes Necessary in Other Crates
vm-virtio
Implementation: https://gitlab.com/hreitz/vm-virtio/-/commits/iommu
As stated above, places that bind `M: GuestMemory` but expect the `Bytes` trait to also be implemented need to be changed to `M: GuestMemory + Bytes` or `M: IoMemory`. I opted for the latter approach, and basically replaced all `GuestMemory` instances by `IoMemory`.

(That is what we want, because dropping `GuestMemory` in favor of `IoMemory` ensures that all vm-virtio crates can work with virtual memory.)
vhost
Implementation: https://gitlab.com/hreitz/vhost/-/commits/iommu
Here, the changes that updating vm-memory necessitates are quite marginal, and have a similar cause: but instead of requiring the `Bytes` trait, it’s the `GuestAddressSpace` trait. The resolution is the same: Switch from requiring `GuestMemory` to `IoMemory`.

The rest of the commits concerns itself with implementing `VhostUserIommu` and allowing users to choose to use `IommuMemory<GuestMemoryMmap, VhostUserIommu>` instead of only `GuestMemoryMmap`.
.virtiofsd (as one user)
Implementation: https://gitlab.com/hreitz/virtiofsd-rs/-/commits/iommu
This is an example of an actual user. Updating all crates to IOMMU-supporting versions actually does not require any changes to the code, but enabling the 'iommu' feature does: This feature makes the vhost-user-backend crate require the `VhostUserBackend::Memory` associated type (because associated type defaults are not stable yet), so this single line of code must be added (which sets the type to `GuestMemoryMmap<BitmapMmapRegion>`), as sketched below.
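A self-contained sketch of that single line; `MyBackend` and all the stand-in definitions are hypothetical, and the real `VhostUserBackend` trait comes from the vhost-user-backend crate:

```rust
use std::marker::PhantomData;

// Stand-ins for the real vm-memory types.
pub struct BitmapMmapRegion;
pub struct GuestMemoryMmap<B>(PhantomData<B>);

// Stand-in for the vhost-user-backend trait; only the associated type
// matters for this illustration.
pub trait VhostUserBackend {
    type Memory;
}

pub struct MyBackend;

impl VhostUserBackend for MyBackend {
    // The single line that enabling the 'iommu' feature makes mandatory:
    type Memory = GuestMemoryMmap<BitmapMmapRegion>;
}
```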
Actually enabling IOMMU support is then a bit more involved, as it requires switching away from `GuestMemoryMmap` to `IommuMemory` again.

However, to me, this shows that end users working with concrete types do not seem to be affected by the incompatible `IoMemory` change until they want to opt in to it. That’s because `GuestMemoryMmap` implements both `GuestMemory` and `IoMemory` (thanks to the blanket impl), so it can transparently be used wherever the updated crates expect to see an `IoMemory` type.

Requirements
Before submitting your PR, please make sure you addressed the following requirements:

- All commits in this PR are signed (with `git commit -s`), and the commit message has max 60 characters for the summary and max 75 characters for each description line.
- All added/changed functionality has a corresponding unit/integration test.
- All added/changed public-facing functionality has entries in the "Upcoming Release" section of CHANGELOG.md (if no such section exists, please create one).
- Any newly added `unsafe` code is properly documented.